signSGD

Magnitude Matters: Fixing SIGNSGD Through Magnitude-Aware Sparsification in the Presence of Data Heterogeneity

Jin, Richeng, He, Xiaofan, Zhong, Caijun, Zhang, Zhaoyang, Quek, Tony, Dai, Huaiyu

arXiv.org Artificial Intelligence

Communication overhead has become one of the major bottlenecks in the distributed training of deep neural networks. To alleviate this concern, various gradient compression methods have been proposed, and sign-based algorithms have attracted surging interest. However, SIGNSGD fails to converge in the presence of data heterogeneity, which is commonly observed in the emerging federated learning (FL) paradigm. Error feedback has been proposed to address this non-convergence issue. Nonetheless, it requires the workers to locally keep track of the compression errors, which makes it unsuitable for FL since the workers may not participate in the training throughout the learning process. In this paper, we propose a magnitude-driven sparsification scheme that addresses the non-convergence issue of SIGNSGD while further improving communication efficiency. Moreover, a local update scheme is incorporated to improve the learning performance, and the convergence of the proposed method is established. The effectiveness of the proposed scheme is validated through experiments on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
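The abstract does not spell out the sparsification rule, so the sketch below shows only one plausible reading of magnitude-aware sign compression: each coordinate's sign is transmitted with a probability that grows with its magnitude, so large entries are almost always sent while small ones are usually dropped. The scale parameter tau and the keep-probability formula are illustrative assumptions, not the paper's exact scheme.

import numpy as np

def magnitude_aware_sign_compress(grad, tau=None, rng=None):
    # Sparsify a gradient into {-1, 0, +1} per coordinate.
    # Illustrative rule: keep a coordinate's sign with probability
    # proportional to its magnitude (capped at 1); tau defaults to the
    # largest magnitude. This is an assumption, not the paper's rule.
    rng = np.random.default_rng() if rng is None else rng
    tau = np.max(np.abs(grad)) if tau is None else tau
    keep_prob = np.minimum(np.abs(grad) / (tau + 1e-12), 1.0)
    keep = rng.random(grad.shape) < keep_prob
    return np.where(keep, np.sign(grad), 0.0)

def aggregate(sign_updates):
    # Server side: element-wise majority vote over the sparse sign
    # vectors received from the workers (sign of the sum).
    return np.sign(np.sum(sign_updates, axis=0))

Because coordinates with tiny magnitudes are rarely transmitted, heterogeneous workers whose small-magnitude signs disagree contribute less noise to the vote, which is the intuition behind fixing the non-convergence while also sending fewer bits.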


DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation

Rajput, Shashank, Wang, Hongyi, Charles, Zachary, Papailiopoulos, Dimitris

arXiv.org Machine Learning

To improve the resilience of distributed training to worst-case, or Byzantine, node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and only offer limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can only tolerate a limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation. DETOX operates in two steps: a filtering step that uses limited redundancy to significantly reduce the effect of Byzantine nodes, and a hierarchical aggregation step that can be used in tandem with any state-of-the-art robust aggregation method. We show theoretically that this leads to a substantial increase in robustness, and has a per-iteration runtime that can be nearly linear in the number of compute nodes. We provide extensive experiments over real distributed setups across a variety of large-scale machine learning tasks, showing that DETOX leads to orders-of-magnitude improvements in accuracy and speed over many state-of-the-art Byzantine-resilient approaches.
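As a rough illustration of the two-step structure described above, the sketch below groups redundant copies of each gradient, filters within each group, and then applies a robust aggregator across the group outputs. The group size, the use of a coordinate-wise median as both the filter and the second-stage aggregator, and the toy data layout are assumptions for illustration; DETOX's actual redundancy assignment and hierarchical aggregation details differ.

import numpy as np

def detox_style_aggregate(worker_grads, group_size=3):
    # Step 1 (filtering): workers are split into groups of `group_size`
    # that are assumed to have computed the same gradient (algorithmic
    # redundancy). With at most a minority of Byzantine members per
    # group, a coordinate-wise median recovers the majority value.
    grads = np.asarray(worker_grads)
    n = (grads.shape[0] // group_size) * group_size
    groups = grads[:n].reshape(-1, group_size, grads.shape[1])
    filtered = np.median(groups, axis=1)

    # Step 2 (hierarchical robust aggregation): combine the per-group
    # outputs with a robust aggregator; the median here is a stand-in
    # for any state-of-the-art method.
    return np.median(filtered, axis=0)

Because the expensive robust aggregator now runs over the (much smaller) set of group outputs rather than over all workers, the per-iteration cost can stay close to linear in the number of nodes, which is the trade-off the abstract highlights.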


signSGD with Majority Vote is Communication Efficient And Byzantine Fault Tolerant

Bernstein, Jeremy, Zhao, Jiawei, Azizzadenesheli, Kamyar, Anandkumar, Anima

arXiv.org Artificial Intelligence

Training neural networks on large datasets can be accelerated by distributing the workload over a network of machines. As datasets grow ever larger, networks of hundreds or thousands of machines become economically viable. The time cost of communicating gradients limits the effectiveness of using such large machine counts, as may the increased chance of network faults. We explore a particularly simple algorithm for robust, communication-efficient learning---signSGD. Workers transmit only the sign of their gradient vector to a server, and the overall update is decided by a majority vote. This algorithm uses $32\times$ less communication per iteration than full-precision distributed SGD. Under natural conditions verified by experiment, we prove that signSGD converges in the large-batch and mini-batch settings, establishing convergence for a parameter regime of Adam as a byproduct. We model adversaries as workers who may compute a stochastic gradient estimate and manipulate it, but may not coordinate with other adversaries. Aggregating sign gradients by majority vote means that no individual worker has too much power. We prove that, unlike SGD, majority vote is robust when up to 50% of workers behave adversarially. On the practical side, we built our distributed training system in PyTorch. Benchmarking against the state-of-the-art collective communications library (NCCL), our framework---with the parameter server housed entirely on one machine---led to a 25% reduction in time for training ResNet-50 on ImageNet when using 15 AWS p3.2xlarge machines.
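A minimal sketch of the sign-and-majority-vote exchange described above is given below. The distributed-systems machinery (the actual PyTorch parameter server, NCCL benchmarking, network faults) is omitted; the learning rate and the toy gradients are illustrative values, not the paper's settings.

import numpy as np

def worker_message(local_grad):
    # Each worker transmits only the sign of its stochastic gradient:
    # 1 bit per coordinate instead of a 32-bit float.
    return np.sign(local_grad)

def server_update(params, sign_messages, lr=0.01):
    # Majority vote: the sign of the coordinate-wise sum is the sign
    # that most workers agree on, so no single (possibly adversarial)
    # worker can move a coordinate by more than one vote.
    vote = np.sign(np.sum(sign_messages, axis=0))
    return params - lr * vote

# Toy round with 5 workers and a 4-dimensional parameter vector.
params = np.zeros(4)
grads = [np.random.randn(4) for _ in range(5)]
params = server_update(params, [worker_message(g) for g in grads])

The majority vote is also where the claimed Byzantine tolerance comes from: as long as fewer than half of the workers flip their signs adversarially, the vote on each coordinate is still decided by the honest majority.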